%-*- lang : icon -*-
\documentstyle [noweb,11pt,fullpage] {article}
\pagestyle {noweb}
\title {Extending Noweb With Some Typesetting (Icon version)}
\author {Kostas N. Oikonomou \\ {\tt ko@surya.ho.att.com}}
\begin {document}
\maketitle
\section {Introduction}
This is a {\tt noweb} filter, written in Icon, which adds to {\tt noweave} the capability
of some simple pretty-printing (no indentation or line-breaking!) in code sections. This
particular version implements pretty-printing for the Icon language. However, the filter is
written so that changing the target language should be easy: most of the code is
language-independent (in \S\ref{lipp} and onward), while the language-dependent code
occupies only sections 1 and 2. In fact, all that needs to change with the language is the
[[translation]] table in procedure [[main]] and the definitions of the ``interesting''
tokens at the beginning of \S\ref{int}.\footnote{Well, hopefully, anyway.}
Using this two-part scheme, the language-independent part (which is in file [[lipp.nw]])
has been used with the language-dependent files [[tnw.nw]], [[inw.nw]], and [[mnw.nw]] to
implement pretty-printing for the languages Object-Oriented Turing, Icon, and {\sl
Mathematica}.
<<*>>=
<<Procedure [[main]]>>
<<Procedure [[filter]]>>
<<Procedure [[TeXify]]>>
<<Global declarations>>
@
\section {A Typesetting Facility}
\subsection {The philosophy}
The addition to {\tt noweb} described here is based on the following two premises:
\begin {itemize}
\item It should be as independent of the target language as possible, and
\item We don't want to write a full-blown scanner for the target language.
\end {itemize}
Strings of characters of the target language which we want to typeset specially are called
``interesting tokens''. Having had some experience with Web and SpiderWeb, we define three
categories of interesting tokens:
\begin {enumerate}
\item Reserved words of the target language: we want to typeset them in bold, say.
\item Other strings that we want to typeset specially: e.g.\ $\le$ for [[<=]].
\item Comment and quoting characters: we want what follows them or what is enclosed by
them to be typeset literally.
\end {enumerate}
There is a table [[translation]] which defines a translation into \TeX\ code for every
interesting token in the target language. Here is an excerpt from the translation table
for Icon:
\begin {center}
\begin {tabular}{l}
[[translation["by"] := "{\\Cb{}by}"]] \\
[[translation["break"] := "{\\Cb{}break}"]] \\
[[translation["&ascii"] := "{\\Cb{}\\&ascii}"]] \\
[[translation["&clock"] := "{\\Cb{}\\&clock}"]] \\
[[translation[">="] := "$\\ge$"]] \\
[[translation["~="] := "$\\neq$"]]
\end {tabular}
\end {center}
(Here the control sequence \verb+\Cb+ selects the Courier bold font\footnote{The empty group
\{\} serves to separate the control sequence from its argument without introducing an
extra space.}.) We use four sets of strings to define the tokens in categories 2 and 3:
\begin {center}
[[special]], [[comment1]], [[comment2]], [[quote]].
\end {center}
[[comment1]] is for unbalanced comment strings (e.g.\ the character [[#]] in Icon),
[[comment2]] is for balanced comment strings (none in Icon), and [[quote]] is for literal
quotes ([["]] and [[']] in Icon), which we assume to be balanced.
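For concreteness, the four sets and the [[translation]] table can be mirrored in Python
(an illustrative sketch, not the author's Icon code; the table shows only a small
excerpt):

```python
# Hypothetical Python mirror of the Icon data structures described above.
# Names follow the noweb chunks; the translation entries are an excerpt only.

translation = {
    "by":    r"{\Cb{}by}",    # reserved words set in Courier bold
    "break": r"{\Cb{}break}",
    ">=":    r"$\ge$",        # "special" tokens mapped to math symbols
    "~=":    r"$\neq$",
}

comment1 = {"#"}              # unbalanced comments: rest of line is literal
comment2 = set()              # balanced comment pairs: none in Icon
quote    = {'"', "'"}         # quoting characters, assumed balanced
special  = {"{", "}", "\\", "||", "<", ">", ">=", "<=", "=>", "~=",
            "++", "**", "--"}
```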
Our approach to recognizing the interesting tokens while scanning a line is to have a set
of characters [[begin_token]] (an Icon cset) containing all the characters with which an
interesting token may begin. [[begin_token]] is the union of
\begin {itemize}
\item the cset defining the characters which may begin a reserved word, and
\item the cset containing the initial characters of all strings in the special, comment,
and quote sets.
\end {itemize}
Given a line of text, we scan up to a character in [[begin_token]] and, depending on what
that character is, may try to complete the token by further scanning. If we succeed, we
look up the token in the [[translation]] table; if it is found, we output its translation,
otherwise we output the token itself unchanged. When a comment or quote token is
recognized, further processing of the line may stop altogether, or be suspended until a
matching token is found.
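The scanning loop just described can be sketched in Python (a hedged illustration: the
real filter is the Icon procedure [[filter]], and the function name and the toy
[[translation]] excerpt here are purely illustrative):

```python
# Illustrative Python rendering of the line-scanning strategy described above.

translation = {">=": r"$\ge$", "by": r"{\Cb{}by}"}   # tiny excerpt
special = {">=", "<", ">"}
id_chars = set("abcdefghijklmnopqrstuvwxyz"
               "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789&$")
# Characters that may begin an interesting token: initials of reserved
# words plus the first characters of the special tokens.
begin_token = set("abcdefghijklmnopqrstuvwxyz&$") | {s[0] for s in special}

def filter_line(line):
    out, i = [], 0
    while i < len(line):
        c = line[i]
        if c not in begin_token:
            out.append(c)              # uninteresting: copy through
            i += 1
            continue
        # Try to complete a token by scanning forward.
        if c.isalpha() or c in "&$":
            j = i
            while j < len(line) and line[j] in id_chars:
                j += 1                 # identifier / reserved word
        else:
            # Prefer the longest special token starting here (2 chars max).
            j = i + 2 if line[i:i + 2] in special else i + 1
        token = line[i:j]
        out.append(translation.get(token, token))  # translate if known
        i = j
    return "".join(out)
```

With the excerpt above, `filter_line("x >= 1 by 2")` yields
`x $\ge$ 1 {\Cb{}by} 2`; comment and quote handling is omitted for brevity.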
<<Procedure [[main]]>>=
procedure main (args)
<<The [[translation]] table>>
<<Definition of interesting tokens>>
<<Emit special {\TeX} definitions>>
<<Read and filter all the input lines>>
end
@
\subsection {Definitions of the interesting tokens}
\label {int}
The set of characters allowed in an Icon identifier, of which a reserved word is a special
case:
<<Definition of interesting tokens>>=
res_word_chars := &letters ++ '&$'
id_chars := res_word_chars ++ &digits
@ Unbalanced and balanced comment tokens, and quoting tokens. Note that [[comment2]] is a
set of {\it pairs} (``open comment'', ``close comment''), while for quoting tokens we
assume that the ``open quote'' and ``close quote'' tokens are identical.
<<Definition of interesting tokens>>=
comment1 := set(["#"])
comment2 := set([])
quote := set(["\"", "\'"])
@ The ``special'' tokens. [[S]] is the set of all characters appearing in strings in
[[special]].
<<Definition of interesting tokens>>=
special := set(["{", "}", "\\", "||", "<", ">", ">=", "<=", "=>", "~=",
"++", "**", "--"])
S := ''
every S := S ++ !special # Nice!
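The cset arithmetic above has a direct analogue in Python sets (a hedged sketch;
[[begin_token]] is assembled here exactly as described in the previous section):

```python
# Illustrative Python analogue of the Icon cset arithmetic above: S collects
# every character appearing in any "special" token, and begin_token unions in
# the characters that may begin a reserved word.

import string

special = {"{", "}", "\\", "||", "<", ">", ">=", "<=", "=>", "~=",
           "++", "**", "--"}

# every S := S ++ !special  -- union the characters of each token
S = set()
for tok in special:
    S |= set(tok)

res_word_chars = set(string.ascii_letters) | set("&$")
begin_token = res_word_chars | {tok[0] for tok in special}
```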
@ The rest of the code is language-independent and lives in the file {\tt lipp.nw}
(Language-Independent Pretty-Printing).